TexAFon 2.0: A text processing tool for the generation of expressive speech in TTS applications

Authors

Juan María Garrido

Yesika Laplaza

Benjamin Kolz

Miquel Cornudella

Abstract

This paper presents TexAFon 2.0, an improved version of the text processing tool TexAFon, specially oriented to the generation of synthetic speech with expressive content. TexAFon is a text processing module for Catalan and Spanish TTS systems that performs the typical tasks needed to generate synthetic speech from text: sentence detection, pre-processing, phonetic transcription, syllabication, prosodic segmentation and stress prediction. The improvements in version 2.0 include a new normalisation module for the standardisation of chat text in Spanish, a module for the detection of the emotions expressed in the input text, and a module for the automatic detection of the intended speech acts, each of which is briefly described in the paper. The results of the evaluations carried out for each module are also presented.


Similar resources

Audio Visual Speech Synthesis and Speech Recognition for Hindi Language

Every person in the world wants to share information and thoughts with other people, so communication plays a very important role. Speech is the primary means of communication. Hindi is a very popular and well-known language of India, which everybody understands, speaks and writes easily. Our system was developed for Hindi text-to-speech and speech-to-text conversion mainly into the Hin...


Semantics and Discourse Processing for Expressive TTS

In this paper we present ongoing work to produce an expressive TTS reader that can be used both in text and dialogue applications. The system has been previously used to read (English) poetry and it has now been extended to apply to short stories. The text is fully analyzed both at phonetic and phonological level, and at syntactic and semantic level. The core of the system is the Prosodic Manag...


A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Traditional Text-To-Speech (TTS) systems have been developed using especially-designed non-expressive scripted recordings. In order to develop a new generation of expressive TTS systems in the Simple4All project, real recordings from the media should be used for training new voices with a whole new range of speaking styles. However, for processing this more spontaneous material, the new systems...


Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...


Refocussing on the Text No in Text-to-speech

Many Natural Language Processing applications depend crucially on the front end processes that handle the input text and transform it into a form usable by the more “sophisticated” linguistic component of the applications. Despite this crucial role, often these front end processes are considered uninteresting, yet it is not unusual for the perception of the complete application to be affected b...




Publication date: 2014
